Summary: MDTraj is a modern, lightweight and efficient software package for analyzing molecular dynamics
simulations. MDTraj reads trajectory data from a wide variety of commonly used formats. It provides
a large number of trajectory analysis capabilities including RMSD, DSSP secondary structure assignment 
and the extraction of common order parameters. The package has a strong focus on interoperability with the 
wider scientific Python ecosystem, bridging the gap between molecular dynamics data and the rapidly-growing 
collection of industry-standard statistical analysis and visualization tools in Python. 

Availability: Package downloads, detailed examples and full documentation are available at 
http://mdtraj.org. The source code is distributed under the GNU Lesser General Public License at 
https://github.com/simtk/mdtraj.



I. INTRODUCTION 

Molecular dynamics (MD) simulations yield a great 
deal of information about the structure, dynamics and 
function of biological macromolecules by modeling the 
physical interactions between their atomic constituents. 
Modern MD simulations, often using distributed computing,
graphics processing unit (GPU) acceleration, or specialized
hardware, can generate large datasets containing
hundreds of gigabytes or more of trajectory data tracking
the positions of a system's atoms over time. To
use these vast and information-rich datasets to
understand biomolecular systems and generate scientific
insight, further computation, analysis and visualization
are required 1 .

Within the last decade, the Python language has 
become a major hub for scientific computing, with a 
wealth of high-quality open source packages, including 
those for interactive computing 2 , machine learning 3 and 
visualization 4 . The environment is ideal for both rapid
development and high performance, as computational
kernels can be implemented in C and FORTRAN yet remain
available within a user-friendly environment.

In the MD community, the benefits of integration with
such industry-standard tools have not yet been fully realized
because of a tradition of custom file formats and
command-line analysis 5,6 . To bridge this gap, we
have developed MDTraj, a modern, open and lightweight
Python library for analysis and manipulation of MD trajectories
with the following goals:

1. To serve as a bridge between MD data and the modern
statistical analysis and scientific visualization
software ecosystem in Python.

2. To support a wide set of MD data formats and
computations.

3. To run extremely rapidly on modern hardware with 
efficient memory utilization, enabling the analysis 
of large datasets. 

II. CAPABILITIES AND IMPLEMENTATION 

Wide range of data formats: MDTraj can read from and
write to a wide range of data formats in use within
the MD community, including RCSB pdb, GROMACS
xtc and trr, CHARMM / NAMD / OpenMM dcd, TINKER
arc, AMBER NetCDF, binpos, mdcrd and prmtop
files. This wide support enables consistent interfaces and
reproducible analyses regardless of users' preferred MD
simulation packages.

Easy featurization: Many data-analysis methods for
MD involve either (a) extracting a vector of order parameters
for each simulation snapshot or (b) defining a distance
metric between snapshots. This category includes
dimensionality reduction techniques such as principal
components analysis (PCA) for constructing free-energy
landscapes, as well as probabilistic models like Markov state
models.
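
The two approaches above can be sketched in a few lines of plain Python. This is a library-independent illustration, not MDTraj's implementation: the helper names `featurize` and `snapshot_distance` are hypothetical, and the choice of all pairwise interatomic distances as the order parameters is an assumption made for the example.

```python
from itertools import combinations
from math import dist


def featurize(frame):
    """(a) Order-parameter vector for one snapshot: all pairwise atom distances.

    `frame` is a sequence of (x, y, z) coordinates, one per atom.
    """
    return [dist(frame[i], frame[j])
            for i, j in combinations(range(len(frame)), 2)]


def snapshot_distance(frame_a, frame_b):
    """(b) Distance metric between snapshots: Euclidean distance in feature space."""
    return dist(featurize(frame_a), featurize(frame_b))


# Two toy three-atom snapshots (hypothetical coordinates, arbitrary units).
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
]

# One feature vector per snapshot; this matrix is the input to PCA and similar methods.
features = [featurize(frame) for frame in frames]
print(features)
print(snapshot_distance(frames[0], frames[1]))
```

Because the features are internal distances, they are unchanged by rigid translation or rotation of a snapshot, which is one reason such representations are convenient inputs for dimensionality reduction.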

MDTraj makes it easy to extract these
representations rapidly. It includes an extremely fast minimal
root mean squared deviation (RMSD) engine capable
of operating near the machine floating point limit, as
described in ref. 7 . Functions for DSSP secondary-structure
assignment 8 , solvent accessible surface area determination
and the extraction of internal degrees of freedom are
similarly optimized in C with extensive use of vectorized
intrinsics.
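
To make the term "minimal RMSD" concrete, the following NumPy sketch computes the RMSD between two conformations after optimal superposition, using the standard SVD-based (Kabsch) formulation. It is an assumption-laden illustration of the quantity being computed, not the optimized C engine described above, and the function name `minimal_rmsd` is hypothetical.

```python
import numpy as np


def minimal_rmsd(P, Q):
    """RMSD between conformations P and Q, each (n_atoms, 3), minimized over
    rigid translations and proper rotations (Kabsch formulation)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove the translational degrees of freedom by centering both structures.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Singular values of the 3x3 covariance matrix give the best achievable overlap.
    H = P.T @ Q
    S = np.linalg.svd(H, compute_uv=False)
    # A negative determinant means the best orthogonal transform is a reflection;
    # flipping the smallest singular value restricts the optimum to proper rotations.
    if np.linalg.det(H) < 0:
        S[-1] = -S[-1]
    msd = (np.sum(P * P) + np.sum(Q * Q) - 2.0 * S.sum()) / P.shape[0]
    # Guard against tiny negative values caused by floating point cancellation.
    return float(np.sqrt(max(msd, 0.0)))


# Toy check: a rotated and translated copy of a structure should superpose exactly.
rng = np.random.default_rng(0)
reference = rng.normal(size=(10, 3))
theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                     [np.sin(theta), np.cos(theta), 0.0],
                     [0.0, 0.0, 1.0]])
moved = reference @ rotation.T + np.array([1.0, -2.0, 0.5])
print(minimal_rmsd(reference, moved))  # close to zero, up to rounding error
```

Note that this formulation never builds the rotation matrix explicitly; only the singular values of a 3x3 matrix are needed, which is part of why the calculation lends itself to heavy optimization.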
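import mdtraj as md
from itertools import combinations
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Load the trajectory
t = md.load('trajectory.pdb')

# Featurize each frame by all pairwise interatomic distances
# (materialize the iterator so it can be converted to an array of index pairs)
pairs = list(combinations(range(t.n_atoms), 2))
X = md.compute_distances(t, pairs)

# Project the distance features onto the two leading principal components
pca = PCA(n_components=2)
Y = pca.fit_transform(X)

# Plot the density of frames in the projected space
plt.hexbin(Y[:, 0], Y[:, 1], bins='log')
plt.show()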
Figure 1: Demonstration of principal components
analysis (PCA) with MDTraj, scikit-learn and 
matplotlib. 




Figure 2: MDTraj's WebGL-based protein and
trajectory viewer. 



Interactive visualization: These fast computational
routines make MDTraj ideal for interactive calculation
and exploratory analysis, using the extensive machine
learning, statistics and visualization packages in the scientific
Python community. Furthermore, MDTraj includes
an interactive WebGL 3D protein viewer in the
IPython notebook based on iview 9 , shown in Fig. 2.

The capabilities of MDTraj serve as a bridge, connecting
MD data with statistics and graphics libraries developed
for general data science audiences. The key advantage
of this design, for users, is access to a much wider
range of state-of-the-art analysis capabilities characterized
by large feature sets, extensive documentation and
active user communities.

A demonstration of this integrative workflow is
shown in Fig. 1, which combines MDTraj with the
scikit-learn package for PCA and matplotlib for visualization
to determine high-variance collective motions
in a protein system. While PCA is a widely used method
that is included in a variety of MD analysis packages,
the advantage of integrating with the wider data science
community is immediately evident when moving on to
more complex statistical analysis. For example, a variety
of sparse and kernelized PCA-like methods have
recently been introduced in the machine learning community
and may be quite powerful for analyzing more
complex protein systems. Because of MDTraj's open and
interoperable design, these cutting-edge statistical tools are
readily available to MD researchers, without
duplication of developer effort and independent of
the particular MD software used to perform the simulations.
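
As an illustration of what a kernelized PCA-like method computes, the sketch below implements a basic kernel PCA with a Gaussian (RBF) kernel in NumPy. It is a from-scratch teaching example under stated assumptions, not the method of any particular package: the function name `rbf_kernel_pca`, the kernel choice and the value of `gamma` are hypothetical, and the random matrix `X` merely stands in for a frames-by-features matrix such as the pairwise distances of Fig. 1. In practice a maintained implementation, such as `KernelPCA` in scikit-learn, would normally be preferred.

```python
import numpy as np


def rbf_kernel_pca(X, n_components=2, gamma=1.0):
    """Project the rows of X onto the leading kernel principal components."""
    X = np.asarray(X, dtype=float)
    # Squared Euclidean distances between all pairs of rows.
    sq_norms = np.sum(X * X, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # clip rounding noise
    # Gaussian (RBF) kernel matrix.
    K = np.exp(-gamma * sq_dists)
    # Center the kernel matrix, i.e. center the data in the implicit feature space.
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # Eigendecomposition of the symmetric centered kernel; keep the largest modes.
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Projection of the training points: unit eigenvectors scaled by sqrt(eigenvalue).
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))


# Stand-in feature matrix: 40 frames, 6 features (random numbers, not real MD data).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
Y = rbf_kernel_pca(X, n_components=2, gamma=0.1)
print(Y.shape)
```

The only change relative to linear PCA is that inner products between frames are replaced by kernel evaluations, so the same feature matrix can be reused unchanged when switching between the two methods.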



III. TESTING AND DEVELOPMENT 

The development and engineering of MDTraj incorporate
modern best practices for scientific computing 10 ,
and the package contains over 900 tests for individual components.
These tests are run continually on each incremental contribution
on both Windows and Linux platforms, using
multiple versions of Python and the required libraries.
The project is licensed under the GNU Lesser
General Public License, and its design and development
take place openly on GitHub at
https://github.com/simtk/mdtraj. More information is available at
http://mdtraj.org.